yeast: Align query semantics more closely with tree-sitter #21810

Open
tausbn wants to merge 3 commits into main from tausbn/yeast-forward-scan-queries



@tausbn tausbn commented May 7, 2026

In particular:

  • Wildcards without parentheses (i.e. bare `_`) now match any node -- named or unnamed, just as in tree-sitter. This also applies to fields: `(foo bar: _ @baz)` is now valid.
  • Unnamed matches now skip over nodes that don't match. Previously, if a node had the shape `(foo "bar" "baz")`, then the query `(foo "baz")` would fail because it would try to match against `"bar"`. Now it skips over `"bar"` and continues to try to match.
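The second bullet can be illustrated with a minimal sketch that models children and patterns as plain strings. The function names `match_strict` and `match_forward_scan` are hypothetical and are not taken from the yeast code base; the sketch only contrasts the two matching strategies described above.

```rust
/// Old behaviour as described above: each pattern must match the very
/// next child, so a non-matching child makes the whole match fail.
/// (Hypothetical helper, not part of yeast.)
fn match_strict(children: &[&str], patterns: &[&str]) -> bool {
    let mut iter = children.iter();
    patterns.iter().all(|p| iter.next() == Some(p))
}

/// New behaviour: each pattern scans forward through the shared
/// iterator, skipping children that do not match.
/// (Hypothetical helper, not part of yeast.)
fn match_forward_scan(children: &[&str], patterns: &[&str]) -> bool {
    let mut iter = children.iter();
    // `any` consumes children up to and including the first match,
    // so later patterns can only match later children.
    patterns.iter().all(|p| iter.any(|c| c == p))
}
```

With children `["bar", "baz"]`, the pattern list `["baz"]` fails under `match_strict` and succeeds under `match_forward_scan`.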

@tausbn tausbn added the no-change-note-required This PR does not need a change note label May 7, 2026
@tausbn tausbn marked this pull request as ready for review May 7, 2026 14:03
@tausbn tausbn requested a review from a team as a code owner May 7, 2026 14:03
Copilot AI review requested due to automatic review settings May 7, 2026 14:03

Copilot AI left a comment

Pull request overview

This PR updates the shared/yeast query language and matcher to behave closer to tree-sitter query semantics, particularly around unnamed tokens and positional child matching.

Changes:

  • Extend query syntax to support bare `_` (match any node, including unnamed) and bare string literals (shorthand for `("...")`), including in field positions.
  • Update positional matching to support forward-scan behavior (skipping over non-matching children rather than requiring exact positional alignment).
  • Refine schema kind import to use canonical tree-sitter IDs for unnamed kinds, aligning schema resolution with how the AST visitor assigns kind IDs.
Summary per file:

| File | Description |
| --- | --- |
| `shared/yeast/tests/test.rs` | Adds regression tests for capturing unnamed tokens, bare `_`, field-position sugar, and forward-scan behavior. |
| `shared/yeast/src/schema.rs` | Imports canonical IDs for unnamed kinds and tracks kind names more consistently. |
| `shared/yeast/src/query.rs` | Adds `match_unnamed` to wildcard nodes and implements forward-scan positional matching. |
| `shared/yeast/doc/yeast.md` | Updates user-facing query-language documentation to describe the new wildcard/token behavior. |
| `shared/yeast-macros/src/parse.rs` | Extends proc-macro parsing to accept bare `_` and bare literals; allows intermixing fields and positional patterns. |
| `shared/yeast-macros/src/lib.rs` | Updates macro-level syntax documentation to reflect the new query forms and ordering rules. |

Copilot's findings

  • Files reviewed: 6/6 changed files
  • Comments generated: 3

Comment thread shared/yeast/doc/yeast.md Outdated
Comment thread shared/yeast-macros/src/parse.rs Outdated
Comment thread shared/yeast/src/query.rs
tausbn added 3 commits May 7, 2026 15:08
Three improvements to the query parser, all aimed at allowing query
patterns to refer to unnamed tokens:

1. Bare-literal capture: `"=" @op` now captures the unnamed `=` token,
   matching the parenthesized form `("=") @op`. Previously the literal
   branch in parse_query_list skipped the maybe_wrap_capture call, so
   the `@op` was a leftover token and would error.

2. Bare `_` matches any node, named or unnamed. Previously bare `_` and
   `(_)` both produced QueryNode::Any with the same matches_named_only
   behaviour, so bare `_` would skip unnamed children. Now Any carries a
   match_unnamed flag: false for `(_)` (named-only, tree-sitter default)
   and true for bare `_` (any node).

3. Named fields and bare child patterns may be intermixed in any order.
   Previously, once parse_query_fields saw a bare pattern it would stop
   accepting named fields. The fix accumulates bare patterns into the
   implicit `child` field and keeps parsing.
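The distinction in item 2 can be sketched as follows. This is a simplified model, not the actual yeast definitions: the `Node` struct and the `Token` variant are assumptions made for illustration, and only the `match_unnamed` flag on `Any` is taken from the commit message.

```rust
/// Hypothetical, simplified node; real yeast nodes carry more data.
struct Node {
    kind: &'static str,
    is_named: bool,
}

/// Simplified stand-in for the query node type named in the commit message.
enum QueryNode {
    /// `(_)` parses with `match_unnamed: false`, bare `_` with `true`.
    Any { match_unnamed: bool },
    /// `("=")` or bare `"="`: an unnamed token of this kind (assumed variant).
    Token(&'static str),
}

/// Returns whether a single query node accepts a single tree node.
fn matches(query: &QueryNode, node: &Node) -> bool {
    match query {
        // Named nodes always match; unnamed ones only when the flag is set.
        QueryNode::Any { match_unnamed } => node.is_named || *match_unnamed,
        QueryNode::Token(text) => !node.is_named && node.kind == *text,
    }
}
```

Under this model, `(_)` rejects an unnamed `=` token while bare `_` accepts it, and both accept a named `identifier` node.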

Each named field independently selects its target field for matching, so
the source-order of fields in the query is purely cosmetic and intermixing
is safe.
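A rough sketch of why intermixing is safe: if every parsed item is filed under its field name, with bare patterns going to the implicit `child` field, the relative order of named fields and bare patterns in the source no longer matters. The `Item` enum and `group_fields` function below are hypothetical and use plain strings in place of parsed patterns.

```rust
use std::collections::BTreeMap;

/// One item inside a parenthesized query (hypothetical model).
enum Item {
    /// `name: pattern`
    Field(&'static str, &'static str),
    /// A bare child pattern with no field name.
    Bare(&'static str),
}

/// Groups patterns by field, sending bare patterns to the implicit
/// `child` field while preserving their relative order.
fn group_fields(items: &[Item]) -> BTreeMap<&'static str, Vec<&'static str>> {
    let mut fields: BTreeMap<&'static str, Vec<&'static str>> = BTreeMap::new();
    for item in items {
        let (name, pattern) = match item {
            Item::Field(name, pattern) => (*name, *pattern),
            Item::Bare(pattern) => ("child", *pattern),
        };
        fields.entry(name).or_default().push(pattern);
    }
    fields
}
```

Parsing the items `"do"`, `body: (_)`, `"end"` in that order yields a `child` field holding `"do"` then `"end"`, and a `body` field holding `(_)`.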

Add tests covering parenthesized capture, bare-literal capture, and the
named-vs-any distinction between `(_)` and bare `_`. Update query-syntax
docs to reflect all three.

Schema::from_language registered unnamed kinds via or_insert(id), where
`id` came from iterating 0..node_kind_count. For names with multiple
unnamed IDs (notably "end" in tree-sitter-ruby has IDs 0 and 13, where
ID 0 is the reserved error token), this picked the first encountered
ID — typically the wrong one.

The visitor sets node.kind via language.id_for_node_kind(name, false),
which returns the canonical ID. So a query for ("end") would compare
node.kind=13 against schema=0 and silently fail to match, with no
diagnostic.
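The mismatch can be reproduced with a toy model. `ToyLanguage` below is a hypothetical stand-in and not the tree-sitter API: its `id_for_node_kind` takes only a name, and the canonical ID is stored explicitly rather than computed. ID 2 plays the role that ID 13 has in the description above.

```rust
use std::collections::HashMap;

/// Toy stand-in for a tree-sitter language (assumption, not the real API).
struct ToyLanguage {
    /// Unnamed kind name at each ID (index = ID).
    unnamed_kinds: Vec<&'static str>,
    /// The ID the toy "visitor" would assign for each name.
    canonical: HashMap<&'static str, u16>,
}

impl ToyLanguage {
    fn id_for_node_kind(&self, name: &str) -> Option<u16> {
        self.canonical.get(name).copied()
    }
}

/// Old registration: the first ID encountered for a name wins.
fn register_first_seen(lang: &ToyLanguage) -> HashMap<&'static str, u16> {
    let mut schema = HashMap::new();
    for (id, name) in lang.unnamed_kinds.iter().enumerate() {
        schema.entry(*name).or_insert(id as u16);
    }
    schema
}

/// New registration: ask the language for the canonical ID.
fn register_canonical(lang: &ToyLanguage) -> HashMap<&'static str, u16> {
    let mut schema = HashMap::new();
    for name in &lang.unnamed_kinds {
        if let Some(id) = lang.id_for_node_kind(name) {
            schema.insert(*name, id);
        }
    }
    schema
}

/// IDs 0 and 2 share the name "end"; 2 is the canonical one in this toy.
fn toy_ruby() -> ToyLanguage {
    ToyLanguage {
        unnamed_kinds: vec!["end", "do", "end"],
        canonical: HashMap::from([("end", 2), ("do", 1)]),
    }
}
```

With `toy_ruby()`, first-seen registration maps `"end"` to 0 while the visitor would label nodes with 2, so a kind comparison never succeeds; canonical registration maps `"end"` to 2 and the two sides agree.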

Use language.id_for_node_kind(name, false) to obtain the canonical ID
when registering, mirroring the named-kind path that already does the
same with id_for_node_kind(name, true).

Previously, a bare child pattern in a query took whatever the next
child of the iterator was and either matched or failed: it would not
scan ahead to find a match. So `(foo ("baz"))` against a `foo` whose
implicit `child` field was `["bar", "baz"]` would fail (the pattern
took "bar" first).

Switch to forward-scan semantics: a SingleNode matcher advances through
the iterator until it finds a child that matches its sub-query. Patterns
that are named-only continue to skip past unnamed children for free.
Order is preserved across multiple bare patterns at the same level —
each pattern advances the shared iterator past whatever it consumed —
so a query cannot match children out of source order.

Captures from a failed match attempt are rolled back via a snapshot, so
partial captures from a complex sub-query do not leak across attempts.
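The snapshot-and-rollback step can be sketched like this. All types and function names here are hypothetical simplifications rather than the yeast implementation; in particular, `try_match` records its capture before checking the kind purely so that a failed attempt leaves something behind to roll back.

```rust
use std::collections::HashMap;

/// Hypothetical child node: a kind plus its source text.
struct Child {
    kind: &'static str,
    text: &'static str,
}

/// Hypothetical sub-query: match a kind and capture the text under a name.
struct Pattern {
    kind: &'static str,
    capture: &'static str,
}

type Captures = HashMap<&'static str, &'static str>;

/// Records a capture first and checks the kind second, so a failed
/// attempt leaves a partial capture behind (deliberate, for illustration).
fn try_match(p: &Pattern, c: &Child, caps: &mut Captures) -> bool {
    caps.insert(p.capture, c.text);
    p.kind == c.kind
}

/// Advances the shared iterator until a child matches, restoring the
/// capture snapshot after every failed attempt.
fn scan<'a>(
    p: &Pattern,
    iter: &mut impl Iterator<Item = &'a Child>,
    caps: &mut Captures,
) -> bool {
    for child in iter {
        let snapshot = caps.clone();
        if try_match(p, child, caps) {
            return true;
        }
        *caps = snapshot; // roll back partial captures
    }
    false
}

/// Matches patterns in order against one shared iterator, so a query
/// cannot match children out of source order.
fn match_children(children: &[Child], patterns: &[Pattern]) -> Option<Captures> {
    let mut caps = Captures::new();
    let mut iter = children.iter();
    for p in patterns {
        if !scan(p, &mut iter, &mut caps) {
            return None;
        }
    }
    Some(caps)
}
```

Cloning the whole capture map per attempt keeps the sketch short; a real matcher might record only the changes made since the snapshot.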

Add two regression tests against the `do` body wrapper in a Ruby
for-loop, whose implicit `child` field contains [do, identifier, end]:
- a query for ("end") matches by skipping past `do` and the identifier
- a query for ("end") then ("do") fails, demonstrating order preservation
@tausbn tausbn force-pushed the tausbn/yeast-forward-scan-queries branch from 16f0c7a to af6e921 Compare May 7, 2026 15:12

tausbn commented May 7, 2026

Rerun has been triggered: 2 restarted 🚀


@asgerf asgerf left a comment

Looks reasonable to me 👍

Labels

documentation no-change-note-required This PR does not need a change note
